275 research outputs found
On the Hardness and Inapproximability of Recognizing Wheeler Graphs
In recent years several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these are used to index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an increasing effort to better understand under which conditions such an indexing scheme is possible. This has led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a way which is space efficient. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. We present:
- The problem of recognizing whether a given graph G=(V,E) is a Wheeler graph is NP-complete for any edge label alphabet of size sigma >= 2, even when G is a DAG. This holds even on a restricted, subset of graphs called d-NFA\u27s for d >= 5. This is in contrast to recent results demonstrating the problem can be solved in polynomial time for d-NFA\u27s where d <= 2. We also show the recognition problem can be solved in linear time for sigma =1;
- There exists an 2^{e log sigma + O(n + e)} time exact algorithm where n = |V| and e = |E|. This algorithm relies on graph isomorphism being computable in strictly sub-exponential time;
- We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to remove the minimum number of edges in order to obtain a Wheeler graph. We show WGV is APX-hard, even when G is a DAG, implying there exists a constant C >= 1 for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all C >= 1, it is NP-hard to find a C-approximation;
- We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler Graph (the dual of the WGV). In contrast to WGV, we prove that the WS problem is in APX for sigma=O(1);
The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial time solvable, raising the open question of which parameters determine this problem\u27s difficulty
Compressibility-Aware Quantum Algorithms on Strings
Sublinear time quantum algorithms have been established for many fundamental
problems on strings. This work demonstrates that new, faster quantum algorithms
can be designed when the string is highly compressible. We focus on two popular
and theoretically significant compression algorithms -- the Lempel-Ziv77
algorithm (LZ77) and the Run-length-encoded Burrows-Wheeler Transform (RL-BWT),
and obtain the results below.
We first provide a quantum algorithm running in time
for finding the LZ77 factorization of an input string with
factors. Combined with multiple existing results, this yields an
time quantum algorithm for finding the RL-BWT encoding
with BWT runs. Note that . We complement these
results with lower bounds proving that our algorithms are optimal (up to
polylog factors).
Next, we study the problem of compressed indexing, where we provide a
time quantum algorithm for constructing a recently
designed space structure with equivalent capabilities as the
suffix tree. This data structure is then applied to numerous problems to obtain
sublinear time quantum algorithms when the input is highly compressible. For
example, we show that the longest common substring of two strings of total
length can be computed in time, where is the
number of factors in the LZ77 factorization of their concatenation. This beats
the best known time quantum algorithm when is
sufficiently small
Conformal blocks on smoothings via mode transition algebras
Here we define a series of associative algebras attached to a vertex operator
algebra , called mode transition algebras, showing they reflect both
algebraic properties of and geometric constructions on moduli of curves.
One can define sheaves of coinvariants on pointed coordinatized curves from
-modules. We show that if the mode transition algebras admit multiplicative
identities with certain properties, these sheaves deform as wanted on families
of curves with nodes (so satisfies smoothing). Consequently, coherent
sheaves of coinvariants defined by vertex operator algebras that satisfy
smoothing form vector bundles. We also show that mode transition algebras give
information about higher level Zhu algebras and generalized Verma modules. As
an application, we completely describe higher level Zhu algebras of the
Heisenberg vertex algebra for all levels, proving a conjecture of
Addabbo--Barron.Comment: 56 page
Algorithms and Lower Bounds for Ordering Problems on Strings
This dissertation presents novel algorithms and conditional lower bounds for a collection of string and text-compression-related problems. These results are unified under the theme of ordering constraint satisfaction. Utilizing the connections to ordering constraint satisfaction, we provide hardness results and algorithms for the following: recognizing a type of labeled graph amenable to text-indexing known as Wheeler graphs, minimizing the number of maximal unary substrings occurring in the Burrows-Wheeler Transformation of a text, minimizing the number of factors occurring in the Lyndon factorization of a text, and finding an optimal reference string for relative Lempel-Ziv encoding
On the Complexity of BWT-Runs Minimization via Alphabet Reordering
The Burrows-Wheeler Transform (BWT) has been an essential tool in text
compression and indexing. First introduced in 1994, it went on to provide the
backbone for the first encoding of the classic suffix tree data structure in
space close to the entropy-based lower bound. Recently, there has been the
development of compact suffix trees in space proportional to "", the number
of runs in the BWT, as well as the appearance of in the time complexity of
new algorithms. Unlike other popular measures of compression, the parameter
is sensitive to the lexicographic ordering given to the text's alphabet.
Despite several past attempts to exploit this, a provably efficient algorithm
for finding, or approximating, an alphabet ordering which minimizes has
been open for years.
We present the first set of results on the computational complexity of
minimizing BWT-runs via alphabet reordering. We prove that the decision version
of this problem is NP-complete and cannot be solved in time unless the Exponential Time Hypothesis fails, where is the
size of the alphabet and is the length of the text. We also show that the
optimization problem is APX-hard. In doing so, we relate two previously
disparate topics: the optimal traveling salesperson path and the number of runs
in the BWT of a text, providing a surprising connection between problems on
graphs and text compression. Also, by relating recent results in the field of
dictionary compression, we illustrate that an arbitrary alphabet ordering
provides a -approximation.
We provide an optimal linear-time algorithm for the problem of finding a run
minimizing ordering on a subset of symbols (occurring only once) under ordering
constraints, and prove a generalization of this problem to a class of graphs
with BWT like properties called Wheeler graphs is NP-complete
The Fine-Grained Complexity of Median and Center String Problems Under Edit Distance
We present the first fine-grained complexity results on two classic problems on strings. The first one is the k-Median-Edit-Distance problem, where the input is a collection of k strings, each of length at most n, and the task is to find a new string that minimizes the sum of the edit distances from itself to all other strings in the input. Arising frequently in computational biology, this problem provides an important generalization of edit distance to multiple strings and is similar to the multiple sequence alignment problem in bioinformatics. We demonstrate that for any ? > 0 and k ? 2, an O(n^{k-?}) time solution for the k-Median-Edit-Distance problem over an alphabet of size O(k) refutes the Strong Exponential Time Hypothesis (SETH). This provides the first matching conditional lower bound for the O(n^k) time algorithm established in 1975 by Sankoff.
The second problem we study is the k-Center-Edit-Distance problem. Here also, the input is a collection of k strings, each of length at most n. The task is to find a new string that minimizes the maximum edit distance from itself to any other string in the input. We prove that the same conditional lower bound as before holds. Our results also imply new conditional lower bounds for the k-Tree-Alignment and the k-Bottleneck-Tree-Alignment problems studied in phylogenetics
- …